Language Independent Statistical Software for Corpus Exploration

Authors

  • John Sinclair
  • Oliver Mason
  • Jackie Ball
  • Geoff Barnbrook
Abstract

This report describes two programs for the statistical analysis of concordance lines. The programs were developed for analysing the lexical context of a given word. The report shows how different parameter settings influence the outcome of collocational analysis, and how the concept of collocation can be extended to extract, from a set of concordance lines, the lines that are typical of a word. Although all the examples are for English, the software is completely language independent and requires only minimal linguistic resources.
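As a rough illustration of how a parameter setting can change the outcome of a collocational analysis, the sketch below counts the words that occur within a fixed window around a node word. It is not the authors' software: the function name `collocates`, the parameters `span` and `min_freq`, and the sample sentence are all assumptions made for this example, and raw co-occurrence counts stand in for whatever statistics the programs actually compute.

```python
from collections import Counter


def collocates(tokens, node, span=4, min_freq=1):
    """Count words occurring within `span` tokens of each occurrence of `node`.

    Hypothetical helper for illustration only; `span` and `min_freq` are
    assumed parameter names, not settings taken from the report.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo = max(0, i - span)
        hi = min(len(tokens), i + span + 1)
        for j in range(lo, hi):
            if j != i:  # skip the node word itself
                counts[tokens[j]] += 1
    # Keep only collocates that reach the frequency threshold.
    return {w: c for w, c in counts.items() if c >= min_freq}


# Invented sample text; whitespace tokenisation keeps the sketch
# free of language-specific resources.
text = "the strong tea was served with strong coffee and the weak tea was left"
tokens = text.split()

narrow = collocates(tokens, "tea", span=1)
wide = collocates(tokens, "tea", span=3)
print(narrow)
print(wide)
```

Widening `span` from 1 to 3 brings in function words such as "the" and more distant content words such as "served", while raising `min_freq` prunes collocates seen only once. That is the kind of sensitivity to parameter settings the abstract refers to.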

Similar resources

Trameur: A Framework for Annotated Text Corpora Exploration

Corpus resources with complex linguistic annotations are becoming increasingly important in the work of language specialists. They often need to perform extensive corpus research, including Natural Language Processing (NLP), statistical modelling and data visualisation. Our software system, called Trameur, aims at making these analyses possible within a single graphical user interface. It relie...

Proof Mining with Dependent Types

Several approaches exist to data-mining big corpora of formal proofs. Some of these approaches are based on statistical machine learning, and some on theory exploration. However, most are developed for either untyped or simply-typed theorem provers. In this paper, we present a method that combines statistical data mining and theory exploration in order to analyse and automate proofs in depend...

Hedges in English for Academic Purposes: A Corpus-based study of Iranian EFL learners

Hedges, as tools to express tentativeness and doubt, have been studied in plenty of research papers in the Iranian EFL research setting. However, their use in a learner corpus, portraying Iranian learner English, is in need of more research attention. With this end in view, this study aimed at investigating how Iranian EFL learners who have majored in English-related fields in Iran deployed hed...

Interactive Part-of-Speech Exploration

We discuss the design of a tool for the interactive exploration of part-of-speech classes using structural features. At the heart of the tool are incremental hierarchical clustering algorithms. The algorithms are used to detect classes using morphological and syntactical features. The algorithms have been modified or designed to allow interactive exploration and constrained clustering. We prese...

Language-independent exploration of repetition and variation in longitudinal child-directed speech: a tool and resources

We present a language-independent tool, called Varseta, for extracting variation sets in child-directed speech. This tool is evaluated against a gold standard corpus annotated with variation sets, MINGLE-3-VS, and used to explore variation sets in 26 languages in CHILDES-26-VS, a comparable corpus derived from the CHILDES database. The tool and the resources are freely available for research.

Journal:
  • Computers and the Humanities

Volume 31

Publication date: 1997